Chunking Using Conditional Random Fields in Korean Texts

Authors

Yong-Hun Lee

Mi-Young Kim

Jong-Hyeok Lee

Abstract

We present a method for chunking Korean texts using conditional random fields (CRFs), a recently introduced probabilistic model for labeling and segmenting sequence data. For agglutinative languages such as Korean and Japanese, rule-based chunking is predominantly used because of its simplicity and efficiency. A hybrid of rule-based and machine learning methods has also been proposed to handle exceptions to the rules. In this paper, we show how CRFs can be applied to the task of chunking Korean texts. Experiments on the STEP 2000 dataset show that the proposed method significantly improves performance and outperforms previous systems.
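To make the "labeling and segmenting" framing concrete, the sketch below shows how a per-token tag sequence can be turned back into chunks. It assumes a BIO tagging scheme (B-X begins a chunk of type X, I-X continues it, O is outside any chunk), which is a common convention for chunking but is not confirmed by the abstract; the function name `bio_to_chunks`, the chunk types NP and VP, and the English sample sentence are all hypothetical. It illustrates only the output representation a sequence labeler such as a CRF would produce, not the CRF model itself.

```python
from typing import List, Tuple


def bio_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Group tokens into (chunk_type, tokens) spans from BIO tags.

    Assumption: tags follow the BIO convention ("B-NP", "I-NP", "O").
    An "I-X" tag that does not continue a chunk of type X starts a new chunk.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    chunks: List[Tuple[str, List[str]]] = []
    current_type = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            starts_new, chunk_type = False, None
        else:
            prefix, _, chunk_type = tag.partition("-")
            starts_new = prefix == "B" or chunk_type != current_type

        # Close the open chunk when we leave it or a new one begins.
        if current_type is not None and (tag == "O" or starts_new):
            chunks.append((current_type, current_tokens))
            current_type, current_tokens = None, []

        if tag != "O":
            if current_type is None:
                current_type, current_tokens = chunk_type, []
            current_tokens.append(token)

    # Flush a chunk that runs to the end of the sentence.
    if current_type is not None:
        chunks.append((current_type, current_tokens))
    return chunks


# Hypothetical example sentence and tags, for illustration only.
sample_tokens = ["the", "small", "dog", "barked", "loudly"]
sample_tags = ["B-NP", "I-NP", "I-NP", "B-VP", "O"]
sample_chunks = bio_to_chunks(sample_tokens, sample_tags)
```

Under this representation, chunking reduces to predicting one tag per token, which is the kind of sequence labeling problem the abstract says CRFs are designed for.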


Similar resources

Determining the Boundaries and Types of Syntactic Phrases in Persian Texts

Text tokenization is the process of splitting text into meaningful tokens such as words, phrases, and sentences. Tokenizing text into syntactic phrases, known as chunking, is an important preprocessing step in many applications such as machine translation, information retrieval, and text-to-speech. In this paper, chunking of Farsi texts is done using statistical and learning methods and the grammat...


DTSim at SemEval-2016 Task 2: Interpreting Similarity of Texts Based on Automated Chunking, Chunk Alignment and Semantic Relation Prediction

In this paper we describe our system (DTSim) submitted at SemEval-2016 Task 2: Interpretable Semantic Textual Similarity (iSTS). We participated in both gold chunks category (texts chunked by human experts and provided by the task organizers) and system chunks category (participants had to automatically chunk the input texts). We developed a Conditional Random Fields based chunker and applied r...


Chunking in Turkish with Conditional Random Fields

In this paper, we report our work on chunking in Turkish. We used the data that we generated by manually translating a subset of the Penn Treebank. We exploited the already available tags in the trees to automatically identify and label chunks in their Turkish translations. We used conditional random fields (CRF) to train a model over the annotated data. We report our results on different level...


Fast Full Parsing by Linear-Chain Conditional Random Fields

This paper presents a chunking-based discriminative approach to full parsing. We convert the task of full parsing into a series of chunking tasks and apply a conditional random field (CRF) model to each level of chunking. The probability of an entire parse tree is computed as the product of the probabilities of individual chunking results. The parsing is performed in a bottom-up manner and the ...


Chinese Chunking based on Conditional Random Fields

In this paper, we propose an approach to Chinese chunking based on the conditional random fields (CRFs) model. For sequence labeling, CRFs have advantages over generative models. Furthermore, Chinese chunking is a difficult sequence labeling task. This paper describes how to use CRFs for Chinese chunking by capturing arbitrary and overlapping features. We defined different types of featur...




Publication date: 2005
